migrate xpack-metricbeat #38081

Merged (82 commits), Mar 12, 2024

Contributor

@sharbuz sharbuz commented Feb 21, 2024

What is the problem this PR solves?

Jenkins->Buildkite pipelines migration

Migrate x-pack metricbeat pipeline

Example of the pipelines:
https://buildkite.com/elastic/beats-xpack-metricbeat/builds/623

Examples of the affected pipelines:
beats-libbeat
beats-metricbeat
beats-packetbeat
beats-winlogbeat
beats-xpack-libbeat

Related issues

https://github.com/elastic/ingest-dev/issues/1693
https://github.com/elastic/ingest-dev/issues/2993 - related to the issue mentioned inside the PR

@sharbuz sharbuz added Packetbeat libbeat Metricbeat Metricbeat x-pack Issues and pull requests for X-Pack features. macOS Enable builds in the CI for darwin testing aws Enable builds in the CI for aws cloud testing backport-7.17 Automated backport to the 7.17 branch with mergify backport-v8.12.0 Automated backport with mergify backport-v8.13.0 Automated backport with mergify labels Feb 21, 2024
@sharbuz sharbuz self-assigned this Feb 21, 2024
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Feb 21, 2024
botelastic bot commented Feb 21, 2024

This pull request doesn't have a Team:<team> label.

Collaborator

elasticmachine commented Feb 21, 2024

💚 Build Succeeded

Build stats

  • Duration: 13 min 20 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)


cleanup() {
echo "---Terraform Cleanup"
.ci/scripts/terraform-cleanup.sh "${MODULE_DIR}" #TODO: move all docker-compose files from the .ci to .buildkite folder before switching to BK
Member

I think some context is missing here: why does this Terraform cleanup function run? A comment would help in the future.
In addition, should this particular script run only if the cloud stage ran for x-pack/metricbeat? Or will it work regardless of the stages, in all the beats that call the cleanup function?

Contributor Author

Both of these functions run only when needed, for example: https://github.com/sharbuz/beats/blob/25547b4ef8c75e9c356573b39e0bc25c0bf5e022/.buildkite/scripts/cloud_tests.sh#L7
The trap fires on EXIT, so the cleanup runs whenever the script finishes, whether it succeeds or fails.
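
A minimal sketch of the EXIT-trap behavior described above, not the PR's actual script. `cleanup` and `run_with_cleanup` are hypothetical names used only for illustration; the subshell stands in for the script whose exit triggers the trap.

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for the real teardown logic.
cleanup() {
  echo "cleanup ran"
}

# Run a command inside a subshell that has an EXIT trap installed.
# The trap fires when the subshell ends, whatever the command's status was,
# and the command's exit status is still what the caller sees.
run_with_cleanup() {
  (
    trap cleanup EXIT
    "$@"
  )
}

run_with_cleanup true                                      # cleanup runs after success
run_with_cleanup false || echo "step failed with status $?" # cleanup runs after failure too
```

Note that the trap does not hide the failure: the second call still reports a non-zero status after the cleanup has run.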

Member

I do understand the trap function.
My main concern is regarding the cleanup function itself. If the pipeline finishes with exit 0, then cleanup will run, but if it fails, will the failure be reported, too? I'm not sure if that's what we want. Post-build steps for tearing down resources should not fail the build but make it unstable or notify the cloud resources manager with a message, IMO.

My team enabled a cloud-reaper mechanism that runs async to delete any leftovers in any of the cloud providers we use. That's how we can avoid having stalled resources and failing builds if the cleanup failed.

Contributor Author

@sharbuz sharbuz Mar 12, 2024

I see what you mean. In this case, I would suggest using an additional "cleanup" step with a Slack notification, or a pipeline with an email notification. @v1v WDYT?

Member

To illustrate a bit more, the current implementation in Jenkins does not fail when tearing down the resources with terraform/docker:

  • beats/Jenkinsfile

    Lines 945 to 950 in 3f46222

    dirs?.each { folder ->
    // If it failed then cleanup without failing the build
    sh(label: 'Terraform Cleanup', script: ".ci/scripts/terraform-cleanup.sh ${folder}", returnStatus: true)
    }
    // Cleanup the docker services
    sh(label: 'Docker Compose Cleanup', script: ".ci/scripts/docker-services-cleanup.sh", returnStatus: true)

It uses returnStatus: true, which means a non-zero exit status from the script does not fail the step; see

returnStatus : boolean (optional)
Normally, a script which exits with a nonzero status code will cause the step to fail with an exception. If this option is checked, the return value of the step will instead be the status code. You may then compare it to zero, for example.

See https://www.jenkins.io/doc/pipeline/steps/workflow-durable-task-step/#sh-shell-script

You can get the same behavior:
either append || true to the cleanup command, or use your additional cleanup step with soft_fail in Buildkite.

WDYT?
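
A minimal sketch of the `|| true` idea in shell, assuming the goal is to report a failed teardown without failing the build. `failing_teardown` and `safe_teardown` are hypothetical names; the real command would be something like the terraform cleanup script.

```bash
#!/usr/bin/env bash

# Hypothetical teardown that fails, standing in for the real cleanup script.
failing_teardown() {
  echo "teardown: could not destroy resources" >&2
  return 1
}

# Run a teardown command, log a warning if it fails, and always return 0
# so the failure never propagates to the calling step.
safe_teardown() {
  "$@" || echo "WARNING: teardown failed (status $?), continuing" >&2
  return 0
}

safe_teardown failing_teardown
echo "exit status seen by the build: $?"   # prints 0
```

Compared with a bare `|| true`, the wrapper keeps a visible warning in the log, which makes leftover resources easier to trace.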

Sounds good to me. But let me ask @dliappis about his preference.

Contributor Author

done: f432822

@sharbuz sharbuz requested a review from v1v March 11, 2024 10:32
Member

@v1v v1v left a comment

There are a few open discussions without an explicit answer or action, hence I'm blocking this.

@@ -43,6 +43,10 @@ with_dependencies
config_git
mage dumpVariables

if [[ "$BUILDKITE_PIPELINE_SLUG" == "beats-xpack-metricbeat" && "${BUILDKITE_STEP_KEY}" == "extended-cloud-test" ]]; then
startCloudTestEnv "${MODULE_DIR}"
Contributor Author

This approach is used to collect all the needed dependencies and tools. Once that is done, we'll consider using our own BK images with one to three commands in the command step. For example:

command: "cd $BEATS_PROJECT_NAME && mage package"

Comment on lines 484 to 486
if [[ "$BUILDKITE_PIPELINE_SLUG" == "beats-xpack-metricbeat" ]]; then
withModule "${MODULE_DIR}"
fi
Contributor Author

answered here

exportVars
export RACE_DETECTOR="true"
export TEST_COVERAGE="true"
export DOCKER_PULL="0"
export TEST_TAGS="oracle"
Contributor Author

fixed/changed

@@ -43,6 +43,10 @@ with_dependencies
config_git
mage dumpVariables

if [[ "$BUILDKITE_PIPELINE_SLUG" == "beats-xpack-metricbeat" && "${BUILDKITE_STEP_KEY}" == "extended-cloud-test" ]]; then
startCloudTestEnv "${MODULE_DIR}"
Contributor Author

I tested your suggestions but ran into some issues, so I decided to revert my changes. I will come back to this in Phase 2.

  1. If it's placed in cloud_tests.sh, it doesn't work because the metricbeat-pipeline step requires the MODULE variable to be defined.
  2. If it's placed in the pre-command hook, it requires large structural changes or duplicated code.

In my opinion, we don't have enough time for this now and have to focus on the main functionality. Once the pipeline generator is ready, we'll revisit the optimization and logic.
@v1v @dliappis WDYT?


trap 'teardown; unset_secrets' EXIT

# Set the MODULE env variable if possible
defineModuleFromTheChangeSet "${MODULE_DIR}"
Member

IIUC, this particular function should use the beats project folder.

directory in Jenkinsfile is set in

beats/Jenkinsfile

Line 1133 in 3f46222

directory: args.project,

and

beats/Jenkinsfile

Line 1105 in 3f46222

* - project -> the name of the project that should match with the folder name.
explains what the project means.

Then directory is used in

def module = withModule ? getCommonModuleInTheChangeSet(directory) : ''
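
A rough shell sketch of what "finding the common module in the change set" could look like. It is an assumption for illustration, not the logic of `getCommonModuleInTheChangeSet` or `defineModuleFromTheChangeSet`: the function name `common_module_in_change_set`, the `<project>/module/<name>/` layout, and the rule that any change outside a module yields no module are all hypothetical.

```bash
#!/usr/bin/env bash

# Hypothetical helper: read changed paths from stdin (one per line) and print
# the module name only if every path sits under "<project>/module/<name>/"
# for one and the same <name>. Otherwise print nothing.
common_module_in_change_set() {
  local project="$1" module="" path name
  while IFS= read -r path; do
    case "$path" in
      "$project"/module/*/*)
        name="${path#"$project"/module/}"   # strip the project/module/ prefix
        name="${name%%/*}"                  # keep the first path component
        ;;
      *)
        return 0   # a change outside any module: no single module applies
        ;;
    esac
    if [ -n "$module" ] && [ "$module" != "$name" ]; then
      return 0     # changes span more than one module
    fi
    module="$name"
  done
  if [ -n "$module" ]; then
    printf '%s\n' "$module"
  fi
  return 0
}

# Example input; in a pipeline the paths might come from `git diff --name-only`.
MODULE="$(printf '%s\n' \
  "x-pack/metricbeat/module/aws/cloudwatch/cloudwatch.go" \
  "x-pack/metricbeat/module/aws/ec2/ec2.go" \
  | common_module_in_change_set "x-pack/metricbeat")"
export MODULE
echo "MODULE=${MODULE}"   # prints MODULE=aws
```

The point of passing the project folder rather than a module directory is that the function needs the root under which modules live in order to compare the changed paths.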

Contributor Author

@sharbuz sharbuz Mar 12, 2024

I've changed the function and its call: 141c990 and 3cf8aa0

export BEATS_AWS_SECRET_KEY
BEATS_AWS_ACCESS_KEY=$(retry 5 vault kv get -field access_key ${AWS_SERVICE_ACCOUNT_SECRET_PATH})
export BEATS_AWS_ACCESS_KEY
fi
Member

After thinking a bit more about defineModuleFromTheChangeSet and how other places could use it, I think MODULE could be set in the pre-command hook.

Suggested change
fi
# Set the MODULE env variable
defineModuleFromTheChangeSet "${BEATS_PROJECT_NAME}"

Therefore my former suggestion for adding defineModuleFromTheChangeSet in .buildkite/scripts/cloud_tests.sh could be deleted.

Contributor Author

I tested it here: c96b847
If we move that to the pre-command hook right now, all the dependencies would have to move to pre-command as well; answered here

Member

Then, with this approach, the MODULE env variable won't be set in the other stages, so the tests will take longer to run.

If that's something to be done in phase 2, that's totally fine with me, but PRs will be slow to test unless defineModuleFromTheChangeSet "${BEATS_PROJECT_NAME}" is also called in all the tests that use withModule: true.

Collaborator

elasticmachine commented Mar 12, 2024

💔 Build Failed

Failed CI Steps

History

cc @sharbuz

Collaborator

elasticmachine commented Mar 12, 2024

💚 Build Succeeded

History

cc @sharbuz

Member

@v1v v1v left a comment

As I mentioned, IIUC, the MODULE env variable won't be set, so ITs will take longer on PRs.

If that's something to be revisited in the future, then LGTM. But if build times are important for phase 1, I'd suggest a follow-up to set the MODULE env variable in the required stages.

Collaborator

💚 Build Succeeded

History

cc @sharbuz

@sharbuz sharbuz merged commit 843010c into elastic:main Mar 12, 2024
29 of 33 checks passed
mergify bot pushed a commit that referenced this pull request Mar 12, 2024
* migrate xpack-metricbeat

(cherry picked from commit 843010c)
mergify bot pushed a commit that referenced this pull request Mar 12, 2024
* migrate xpack-metricbeat

(cherry picked from commit 843010c)

# Conflicts:
#	.buildkite/scripts/generate_packetbeat_pipeline.sh
mergify bot pushed a commit that referenced this pull request Mar 12, 2024
* migrate xpack-metricbeat

(cherry picked from commit 843010c)
sharbuz added a commit that referenced this pull request Mar 12, 2024
* migrate xpack-metricbeat

(cherry picked from commit 843010c)

Co-authored-by: sharbuz <[email protected]>
sharbuz added a commit that referenced this pull request Mar 12, 2024
* migrate xpack-metricbeat

(cherry picked from commit 843010c)

Co-authored-by: sharbuz <[email protected]>
sharbuz added a commit that referenced this pull request Mar 12, 2024
* migrate xpack-metricbeat (#38081)

* migrate xpack-metricbeat

(cherry picked from commit 843010c)

# Conflicts:
#	.buildkite/scripts/generate_packetbeat_pipeline.sh

* Update generate_packetbeat_pipeline.sh

* Update setenv.sh

---------

Co-authored-by: sharbuz <[email protected]>
@sharbuz sharbuz deleted the migrate-xpack-metricbeat branch March 12, 2024 15:49
Labels
aws Enable builds in the CI for aws cloud testing backport-7.17 Automated backport to the 7.17 branch with mergify backport-v8.12.0 Automated backport with mergify backport-v8.13.0 Automated backport with mergify macOS Enable builds in the CI for darwin testing needs_team Indicates that the issue/PR needs a Team:* label x-pack Issues and pull requests for X-Pack features.
4 participants